{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Basic RNN\n",
    "- Objective: to understand basics of RNN & LSTM"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Recurrent Neural Networks\n",
    "- Feedforward neural networks (e.g. MLPs and CNNs) are powerful, but they are not optimized to handle \"sequential\" data\n",
    "- In other words, they do not possess \"memory\" of previous inputs\n",
    "- For instance, consider the case of translating a corpus. You need to consider the **\"context\"** to guess the next word to come forward\n",
    "\n",
    "<img src=\"http://2.bp.blogspot.com/-9GIdV292xV4/UwOIy6B6koI/AAAAAAAAHi4/X6UGlyHI-_U/s1600/tumblr_ms5qzpFY671r9nm7io1_500.gif\" style=\"width: 500px\"/>\n",
    "\n",
    "<br>\n",
    "- RNNs are suitable for dealing with sequential format data since they have **\"recurrent\"** structure\n",
    "- To put it differently, they keep the **\"memory\"** of earlier inputs in the sequence\n",
    "</br>\n",
    "<img src=\"http://www.wildml.com/wp-content/uploads/2015/09/rnn.jpg\" style=\"width: 600px\"/>\n",
    "\n",
    "<br>\n",
    "- However, in order to reduce the number of parameters, every layer of different time steps shares same parameters\n",
    "</br>\n",
    "\n",
    "<img src=\"http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png\" style=\"width: 600px\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "from sklearn.metrics import accuracy_score\n",
    "from keras.datasets import reuters\n",
    "from keras.preprocessing.sequence import pad_sequences\n",
    "from keras.utils import to_categorical"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# parameters for data load\n",
    "num_words = 30000\n",
    "maxlen = 50\n",
    "test_split = 0.3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "(X_train, y_train), (X_test, y_test) = reuters.load_data(num_words = num_words, maxlen = maxlen, test_split = test_split)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# pad the sequences with zeros \n",
    "# padding parameter is set to 'post' => 0's are appended to end of sequences\n",
    "X_train = pad_sequences(X_train, padding = 'post')\n",
    "X_test = pad_sequences(X_test, padding = 'post')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "X_train = np.array(X_train).reshape((X_train.shape[0], X_train.shape[1], 1))\n",
    "X_test = np.array(X_test).reshape((X_test.shape[0], X_test.shape[1], 1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "y_data = np.concatenate((y_train, y_test))\n",
    "y_data = to_categorical(y_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "y_train = y_data[:1395]\n",
    "y_test = y_data[1395:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(1395, 49, 1)\n",
      "(599, 49, 1)\n",
      "(1395, 46)\n",
      "(599, 46)\n"
     ]
    }
   ],
   "source": [
    "print(X_train.shape)\n",
    "print(X_test.shape)\n",
    "print(y_train.shape)\n",
    "print(y_test.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Vanilla RNN\n",
    "- Vanilla RNNs have a simple structure\n",
    "- However, they suffer from the problem of \"long-term dependencies\"\n",
    "- Hence, they are not able to keep the **sequential memory\" for long\n",
    "\n",
    "<img src=\"http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-SimpleRNN.png\" style=\"width: 600px\"/>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from keras.models import Sequential\n",
    "from keras.layers import Dense, SimpleRNN, Activation\n",
    "from keras import optimizers\n",
    "from keras.wrappers.scikit_learn import KerasClassifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def vanilla_rnn():\n",
    "    model = Sequential()\n",
    "    model.add(SimpleRNN(50, input_shape = (49,1), return_sequences = False))\n",
    "    model.add(Dense(46))\n",
    "    model.add(Activation('softmax'))\n",
    "    \n",
    "    adam = optimizers.Adam(lr = 0.001)\n",
    "    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])\n",
    "    \n",
    "    return model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model = KerasClassifier(build_fn = vanilla_rnn, epochs = 200, batch_size = 50, verbose = 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\r",
      " 50/599 [=>............................] - ETA: 0s"
     ]
    }
   ],
   "source": [
    "y_pred = model.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "y_test_ = np.argmax(y_test, axis = 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.74958263773\n"
     ]
    }
   ],
   "source": [
    "print(accuracy_score(y_pred, y_test_))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Stacked Vanilla RNN\n",
    "- RNN layers can be stacked to form a deeper network\n",
    "\n",
    "<img src=\"https://lh6.googleusercontent.com/rC1DSgjlmobtRxMPFi14hkMdDqSkEkuOX7EW_QrLFSymjasIM95Za2Wf-VwSC1Tq1sjJlOPLJ92q7PTKJh2hjBoXQawM6MQC27east67GFDklTalljlt0cFLZnPMdhp8erzO\" style=\"width: 500px\"/>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def stacked_vanilla_rnn():\n",
    "    model = Sequential()\n",
    "    model.add(SimpleRNN(50, input_shape = (49,1), return_sequences = True))   # return_sequences parameter has to be set True to stack\n",
    "    model.add(SimpleRNN(50, return_sequences = False))\n",
    "    model.add(Dense(46))\n",
    "    model.add(Activation('softmax'))\n",
    "    \n",
    "    adam = optimizers.Adam(lr = 0.001)\n",
    "    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])\n",
    "    \n",
    "    return model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model = KerasClassifier(build_fn = stacked_vanilla_rnn, epochs = 200, batch_size = 50, verbose = 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "500/599 [========================>.....] - ETA: 0s"
     ]
    }
   ],
   "source": [
    "y_pred = model.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.746243739566\n"
     ]
    }
   ],
   "source": [
    "print(accuracy_score(y_pred, y_test_))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. LSTM\n",
    "- LSTM (long short-term memory) is an improved structure to solve the problem of long-term dependencies\n",
    "\n",
    "<img src=\"http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png\" style=\"width: 600px\"/>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from keras.layers import LSTM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def lstm():\n",
    "    model = Sequential()\n",
    "    model.add(LSTM(50, input_shape = (49,1), return_sequences = False))\n",
    "    model.add(Dense(46))\n",
    "    model.add(Activation('softmax'))\n",
    "    \n",
    "    adam = optimizers.Adam(lr = 0.001)\n",
    "    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])\n",
    "    \n",
    "    return model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model = KerasClassifier(build_fn = lstm, epochs = 200, batch_size = 50, verbose = 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "500/599 [========================>.....] - ETA: 0s"
     ]
    }
   ],
   "source": [
    "y_pred = model.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.844741235392\n"
     ]
    }
   ],
   "source": [
    "# accuracy improves by adopting LSTM structure\n",
    "print(accuracy_score(y_pred, y_test_))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Stacked LSTM\n",
    "- LSTM layers can be stacked as well"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def stacked_lstm():\n",
    "    model = Sequential()\n",
    "    model.add(LSTM(50, input_shape = (49,1), return_sequences = True))\n",
    "    model.add(LSTM(50, return_sequences = False))\n",
    "    model.add(Dense(46))\n",
    "    model.add(Activation('softmax'))\n",
    "    \n",
    "    adam = optimizers.Adam(lr = 0.001)\n",
    "    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])\n",
    "    \n",
    "    return model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model = KerasClassifier(build_fn = stacked_lstm, epochs = 200, batch_size = 50, verbose = 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "500/599 [========================>.....] - ETA: 0s"
     ]
    }
   ],
   "source": [
    "y_pred = model.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.858096828047\n"
     ]
    }
   ],
   "source": [
    "print(accuracy_score(y_pred, y_test_))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
